AITopics | maxent model

Collaborating Authors

maxent model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient first-order algorithms for large-scale, non-smooth maximum entropy models with application to wildfire science

Langlois, Gabriel P., Buch, Jatan, Darbon, Jérôme

arXiv.org Machine LearningMar-11-2024

Maximum entropy (Maxent) models are a class of statistical models that use the maximum entropy principle to estimate probability distributions from data. Due to the size of modern data sets, Maxent models need efficient optimization algorithms to scale well for big data applications. State-of-the-art algorithms for Maxent models, however, were not originally designed to handle big data sets; these algorithms either rely on technical devices that may yield unreliable numerical results, scale poorly, or require smoothness assumptions that many practical Maxent models lack. In this paper, we present novel optimization algorithms that overcome the shortcomings of state-of-the-art algorithms for training large-scale, non-smooth Maxent models. Our proposed first-order algorithms leverage the Kullback-Leibler divergence to train large-scale and non-smooth Maxent models efficiently. For Maxent models with discrete probability distribution of $n$ elements built from samples, each containing $m$ features, the stepsize parameters estimation and iterations in our algorithms scale on the order of $O(mn)$ operations and can be trivially parallelized. Moreover, the strong $\ell_{1}$ convexity of the Kullback--Leibler divergence allows for larger stepsize parameters, thereby speeding up the convergence rate of our algorithms. To illustrate the efficiency of our novel algorithms, we consider the problem of estimating probabilities of fire occurrences as a function of ecological features in the Western US MTBS-Interagency wildfire data set. Our numerical results show that our algorithms outperform the state of the arts by one order of magnitude and yield results that agree with physical models of wildfire occurrence and previous statistical analyses of wildfire drivers.

algorithm, maxent model, stepsize parameter, (16 more...)

arXiv.org Machine Learning

2403.06816

Country:

North America > United States > New York > New York County > New York City (0.28)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Government > Regional Government > North America Government > United States Government (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.82)

Add feedback

Revisiting Supertagging for HPSG

Zamaraeva, Olga, Gómez-Rodríguez, Carlos

arXiv.org Artificial IntelligenceSep-14-2023

We present new supertaggers trained on HPSG-based treebanks. These treebanks feature high-quality annotation based on a well-developed linguistic theory and include diverse and challenging test datasets, beyond the usual WSJ section 23 and Wikipedia data. HPSG supertagging has previously relied on MaxEnt-based models. We use SVM and neural CRF- and BERT-based methods and show that both SVM and neural supertaggers achieve considerably higher accuracy compared to the baseline. Our fine-tuned BERT-based tagger achieves 97.26% accuracy on 1000 sentences from WSJ23 and 93.88% on the completely out-of-domain The Cathedral and the Bazaar (cb)). We conclude that it therefore makes sense to integrate these new supertaggers into modern HPSG parsers, and we also hope that the diverse and difficult datasets we used here will gain more popularity in the field. We contribute the complete dataset reformatted for token classification.

accuracy, grammar, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2309.0759

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Scalable Dyadic Independence Models with Local and Global Constraints

Adriaens, Florian, Mara, Alexandru, Lijffijt, Jefrey, De Bie, Tijl

arXiv.org Machine LearningFeb-14-2020

An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large networks. By utilizing matrix block-approximation techniques, we propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) models, while being able to meaningfully model local information (degrees) as well as global information (clustering coefficient, assortativity, etc.) if desired. This allows one to efficiently generate random networks with similar properties as an observed network, scalable up to sparse graphs consisting of millions of nodes. Empirical evaluation demonstrates its competitiveness in terms of accuracy with state-of-the-art methods for link prediction and network reconstruction.

constraint, graph, scalable dyadic independence model, (14 more...)

arXiv.org Machine Learning

2002.07076

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Belgium (0.04)

Genre: Research Report (0.84)

Industry: Information Technology (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

WEST: Word Encoded Sequence Transducers

Variani, Ehsan, Suresh, Ananda Theertha, Weintraub, Mitchel

arXiv.org Machine LearningNov-20-2018

Most of the parameters in large vocabulary models are used in embedding layer to map categorical features to vectors and in softmax layer for classification weights. This is a bottle-neck in memory constraint on-device training applications like federated learning and on-device inference applications like automatic speech recognition (ASR). One way of compressing the embedding and softmax layers is to substitute larger units such as words with smaller sub-units such as characters. However, often the sub-unit models perform poorly compared to the larger unit models. We propose WEST, an algorithm for encoding categorical features and output classes with a sequence of random or domain dependent sub-units and demonstrate that this transduction can lead to significant compression without compromising performance. WEST bridges the gap between larger unit and sub-unit models and can be interpreted as a MaxEnt model over sub-unit features, which can be of independent interest.

arxiv preprint arxiv, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1811.08417

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

Add feedback

Can we use gradient desent method in maximum entropy model?

#artificialintelligenceJul-14-2016, 08:30:37 GMT

I see a lot of implementations use GIS or IIS to train the maximum entropy model. Can we use gradient desent method? If we can use it, why most tutorial directly tell GIS or IIS methos, but do not show the simple gradient desent method to train maximum entropy model? As we know, softmax regression is equivalent to the maxent model, but I never heard GIS or IIS in softmax. Is there a toy code use simple gradient desent method to train maxent model?

artificial intelligence, gradient desent method, machine learning, (4 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.99)

Add feedback

Reduction of Maximum Entropy Models to Hidden Markov Models

Goodman, Joshua

arXiv.org Artificial IntelligenceDec-12-2012

We show that maximum entropy (maxent) models can be modeled with certain kinds of HMMs, allowing us to construct maxent models with hidden variables, hidden state sequences, or other characteristics. The models can be trained using the forward-backward algorithm. While the results are primarily of theoretical interest, unifying apparently unrelated concepts, we also give experimental results for a maxent model with a hidden variable on a word disambiguation task; the model outperforms standard techniques.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1301.057

Country: North America > United States > Washington > King County > Redmond (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)

Add feedback

Explicit probabilistic models for databases and networks

De Bie, Tijl

arXiv.org Artificial IntelligenceJun-29-2009

Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, to which the found patterns can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation intensive randomization approaches in estimating the statistical significance of data mining results. Here, we introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they also have the benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we have chosen to demonstrate it in particular for databases and networks.

constraint, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

0906.5148

Country:

North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

Pavlov, Dmitry Y., Pennock, David M.

Neural Information Processing SystemsDec-31-2003

We develop a maximum entropy (maxent) approach to generating recommendations in the context of a user's current navigation stream, suitable for environments where data is sparse, high-dimensional, and dynamic-- conditions typical of many recommendation applications. We address sparsity and dimensionality reduction by first clustering items based on user access patterns so as to attempt to minimize the apriori probability that recommendations will cross cluster boundaries and then recommending only within clusters. We address the inherent dynamic nature of the problem by explicitly modeling the data as a time series; we show how this representational expressivity fits naturally into a maxent framework. We conduct experiments on data from ResearchIndex, a popular online repository of over 470,000 computer science documents. We show that our maxent formulation outperforms several competing algorithms in offline tests simulating the recommendation of documents to ResearchIndex users.

maxent model, prediction, recommendation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)

Add feedback

A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

Pavlov, Dmitry Y., Pennock, David M.

Neural Information Processing SystemsDec-31-2003

maxent model, prediction, recommendation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)

Add feedback

A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains

Pavlov, Dmitry Y., Pennock, David M.

Neural Information Processing SystemsDec-31-2003

We develop a maximum entropy (maxent) approach to generating recommendations inthe context of a user's current navigation stream, suitable for environments where data is sparse, high-dimensional, and dynamic-- conditions typical of many recommendation applications. We address sparsity and dimensionality reduction by first clustering items based on user access patterns so as to attempt to minimize the apriori probability thatrecommendations will cross cluster boundaries and then recommending onlywithin clusters. We address the inherent dynamic nature of the problem by explicitly modeling the data as a time series; we show how this representational expressivity fits naturally into a maxent framework.

maxent model, prediction, recommendation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.62)

Add feedback